Smart Paper Notes

Esports Data-to-commentary Generation on Large-scale Data-to-text Dataset

Zihan Wang, Naoki Yoshinaga

Category: Natural Language Processing

2022-12-21

Esports, a sports competition using video games, has become one of the most important sporting events in recent years. Although more esports data is available than ever, only a small fraction of it is accompanied by text commentaries that help the audience retrieve and understand the plays. Therefore, in this study, we introduce the task of generating game commentaries from structured data records to address this problem. We first build a large-scale esports data-to-text dataset using structured data and commentaries from a popular esports game, League of Legends. On this dataset, we devise several data preprocessing methods, including linearization and data splitting, to improve its quality. We then introduce several baseline encoder-decoder models and propose a hierarchical model to generate game commentaries. Considering the characteristics of esports commentaries, we design evaluation metrics covering three aspects of the output: correctness, fluency, and strategic depth. Experimental results on our large-scale esports dataset confirm the advantage of the hierarchical model and reveal several challenges of this novel task.

translated by Google Translate

Reconstructing Training Data from Model Gradient, Provably

Zihan Wang, Jason Lee, Qi Lei

Category: Machine Learning | Machine Learning (Statistics)

2022-12-07

Understanding when and how much a model gradient leaks information about the training sample is an important question in privacy. In this paper, we present a surprising result: even without training or memorizing the data, we can fully reconstruct the training samples from a single gradient query at a randomly chosen parameter value. We prove the identifiability of the training data under mild conditions: with shallow or deep neural networks and a wide range of activation functions. We also present a statistically and computationally efficient algorithm based on tensor decomposition to reconstruct the training data. As a provable attack that reveals sensitive training data, our findings suggest potentially severe threats to privacy, especially in federated learning.
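
To make the idea of gradient leakage concrete, here is a minimal sketch of a much simpler, well-known special case. It is not the tensor-decomposition algorithm described in the abstract. It assumes a single linear neuron with a bias term, a squared-error loss, and one training sample; the function names and the sample values are hypothetical. In that setting the weight gradient is the input scaled by the residual, and the bias gradient is the residual itself, so dividing one by the other returns the input.

```python
# Illustrative only: one linear neuron with bias and squared-error loss.
# This is NOT the tensor-decomposition attack from the abstract.

def gradients(w, b, x, y):
    """Return (dL/dw, dL/db) for L = 0.5 * (w.x + b - y)^2 on one sample."""
    residual = sum(wi * xi for wi, xi in zip(w, x)) + b - y
    grad_w = [residual * xi for xi in x]  # dL/dw_i = residual * x_i
    grad_b = residual                     # dL/db   = residual
    return grad_w, grad_b


def reconstruct_input(grad_w, grad_b):
    """Recover x from the gradient alone; needs a nonzero residual."""
    if grad_b == 0:
        raise ValueError("zero residual: the gradient reveals nothing about x")
    return [g / grad_b for g in grad_w]


# Hypothetical private sample and arbitrary parameter values.
secret_x = [0.5, -2.0, 3.0]
grad_w, grad_b = gradients(w=[0.1, 0.2, -0.3], b=0.05, x=secret_x, y=1.0)
recovered_x = reconstruct_input(grad_w, grad_b)
```

The attacker in this sketch sees only `grad_w` and `grad_b`, never `secret_x`. Deeper networks and batches of samples do not admit this one-line division, which is the harder setting the abstract addresses.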

Imperceptible Adversarial Attack via Invertible Neural Networks

Zihan Chen, Ziyue Wang, Junjie Huang, Wentao Zhao, Xiao Liu, Dejian Guan

Category: Computer Vision

2022-11-28

Adding perturbations using auxiliary gradient information and discarding existing details of the benign images are two common approaches for generating adversarial examples. Though visual imperceptibility is a desired property of adversarial examples, conventional adversarial attacks still generate traceable adversarial perturbations. In this paper, we introduce a novel Adversarial Attack via Invertible Neural Networks (AdvINN) method to produce robust and imperceptible adversarial examples. Specifically, AdvINN takes full advantage of the information preservation property of Invertible Neural Networks and thereby generates adversarial examples by simultaneously adding class-specific semantic information of the target class and dropping discriminant information of the original class. Extensive experiments on CIFAR-10, CIFAR-100, and ImageNet-1K demonstrate that the proposed AdvINN method produces less perceptible adversarial images than state-of-the-art methods and yields more robust adversarial examples with high confidence compared to other adversarial attacks.
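
The information preservation property mentioned above can be illustrated with an additive coupling layer, a common building block of invertible networks. The abstract does not say which invertible architecture AdvINN uses, so the sketch below is only an assumption-laden illustration: `shift` is a hypothetical stand-in for a learned sub-network, and integer inputs are used so that the inversion is exact.

```python
# Illustrative additive coupling layer; not AdvINN's actual architecture.

def shift(half):
    """Hypothetical stand-in for a learned sub-network; need not be invertible."""
    return [3 * v * v + 1 for v in half]


def coupling_forward(x1, x2, f):
    """y1 = x1, y2 = x2 + f(x1)."""
    return list(x1), [b + s for b, s in zip(x2, f(x1))]


def coupling_inverse(y1, y2, f):
    """x1 = y1, x2 = y2 - f(y1); exact because y1 equals x1."""
    return list(y1), [b - s for b, s in zip(y2, f(y1))]


x1, x2 = [1, 2], [5, -4]
y1, y2 = coupling_forward(x1, x2, shift)
restored_x1, restored_x2 = coupling_inverse(y1, y2, shift)
```

Because the first half passes through unchanged, the shift applied to the second half can always be recomputed and subtracted, so no input information is lost even though `shift` itself is not invertible.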

Cross-Domain Local Characteristic Enhanced Deepfake Video Detection

Zihan Liu, Hanyi Wang, Shilin Wang

Category: Computer Vision

2022-11-07

As ultra-realistic face forgery techniques emerge, deepfake detection has attracted increasing attention due to security concerns. Many detectors cannot achieve accurate results when detecting unseen manipulations despite excellent performance on known forgeries. In this paper, we are motivated by the observation that the discrepancies between real and fake videos are extremely subtle and localized, and that inconsistencies or irregularities can exist in some critical facial regions across various information domains. To this end, we propose a novel pipeline, Cross-Domain Local Forensics (XDLF), for more general deepfake video detection. In the proposed pipeline, a specialized framework simultaneously exploits local forgery patterns from the spatial, frequency, and temporal domains, thus learning cross-domain features to detect forgeries. Moreover, the framework leverages four high-level forgery-sensitive local regions of a human face to guide the model to enhance subtle artifacts and localize potential anomalies. Extensive experiments on several benchmark datasets demonstrate the strong performance of our method, which outperforms several state-of-the-art methods on cross-dataset generalization. Ablation studies of the factors behind its performance suggest that exploiting cross-domain local characteristics is a noteworthy direction for developing more general deepfake detectors.

DAD vision: opto-electronic co-designed computer vision with division adjoint method

Zihan Zang, Haoqiang Wang, Yunpeng Xu

Category: Computer Vision

2022-11-04

The miniaturization and mobility of computer vision systems are limited by the heavy computational burden and the size of optical lenses. Here, we propose to use an ultra-thin diffractive optical element to implement passive optical convolution. A division adjoint opto-electronic co-design method is also proposed. In our simulation experiments on a CIFAR-10 classification task, the first few convolutional layers of the neural network can be replaced by optical convolution, which consumes no power, while similar performance is obtained.

SSD: Towards Better Text-Image Consistency Metric in Text-to-Image Generation

Zhaorui Tan, Zihan Ye, Qiufeng Wang, Yuyao Yan, Anh Nguyen, Xi Yang, Kaizhu Huang

Category: Computer Vision

2022-10-27

Generating consistent and high-quality images from given texts is essential for visual-language understanding. Although impressive results have been achieved in generating high-quality images, text-image consistency is still a major concern in existing GAN-based methods. In particular, the most popular metric, $R$-precision, may not accurately reflect text-image consistency, often resulting in very misleading semantics in the generated images. Despite its significance, how to design a better text-image consistency metric surprisingly remains under-explored in the community. In this paper, we take a further step forward and develop a novel CLIP-based metric termed Semantic Similarity Distance ($SSD$), which is both theoretically founded from a distributional viewpoint and empirically verified on benchmark datasets. Benefiting from the proposed metric, we further design the Parallel Deep Fusion Generative Adversarial Networks (PDF-GAN), which aims to improve text-image consistency by fusing semantic information at different granularities and capturing accurate semantics. Equipped with two novel plug-and-play components, Hard-Negative Sentence Constructor and Semantic Projection, the proposed PDF-GAN can mitigate inconsistent semantics and bridge the text-image semantic gap. A series of experiments on the CUB and COCO datasets show that, compared with current state-of-the-art methods, our PDF-GAN achieves significantly better text-image consistency while maintaining decent image quality.

Masked Imitation Learning: Discovering Environment-Invariant Modalities in Multimodal Demonstrations

Yilun Hao, Ruinan Wang, Zhangjie Cao, Zihan Wang, Yuchen Cui, Dorsa Sadigh

Category: Machine Learning | Robotics

2022-09-16

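Multimodal demonstrations provide robots with an abundance of information for making sense of the world. However, this abundance does not always lead to good performance when learning sensorimotor control policies from human demonstrations. Extraneous data modalities can lead to state over-specification, where the state contains modalities that are not only useless for decision-making but can also change the data distribution across environments. State over-specification leads to issues such as the learned policy failing to generalize outside the training data distribution. In this work, we propose Masked Imitation Learning (MIL) to address state over-specification by selectively using informative modalities. Specifically, we design a masked policy network with a binary mask to block certain modalities. We develop a bi-level optimization algorithm that learns this mask to accurately filter out over-specified modalities. We demonstrate empirically that MIL outperforms baseline algorithms in simulated domains, including MuJoCo and a robot arm environment using the Robomimic dataset, and effectively recovers the environment-invariant modalities on a multimodal dataset collected on a real robot. Our project website presents supplementary details and videos of our results at https://tinyurl.com/masked-il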
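
As a rough illustration of what a binary mask over modalities does to a policy input, the sketch below zeroes out blocked modalities before concatenating the rest into one feature vector. It covers only the masking step: the mask here is set by hand, whereas the abstract describes learning it with a bi-level optimization algorithm, which is not shown. The modality names and values are hypothetical.

```python
# Illustrative masking step only; the mask is hand-set, not learned.

def apply_modality_mask(observation, mask):
    """Zero out blocked modalities and concatenate all into one input vector."""
    if observation.keys() != mask.keys():
        raise ValueError("observation and mask must name the same modalities")
    features = []
    for name in sorted(observation):  # fixed order keeps the layout stable
        keep = mask[name]             # 1 keeps the modality, 0 blocks it
        features.extend(value * keep for value in observation[name])
    return features


# Hypothetical observation with three modalities.
observation = {
    "proprioception": [0.1, 0.2],
    "wrist_camera": [0.9, 0.8, 0.7],
    "background_audio": [0.4],
}
mask = {"proprioception": 1, "wrist_camera": 1, "background_audio": 0}

policy_input = apply_modality_mask(observation, mask)
```

Zeroing rather than dropping a modality keeps the input dimension fixed, so the same policy network can be evaluated under different candidate masks.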

M^4I: Multi-modal Models Membership Inference

Pingyi Hu, Zihan Wang, Ruoxi Sun, Hu Wang, Minhui Xue

Category: Machine Learning

2022-09-15

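With the development of machine learning techniques, research attention has shifted from single-modal learning to multi-modal learning, because real-world data exist in different modalities. However, multi-modal models often carry more information than single-modal models, and they are usually applied in sensitive scenarios such as medical report generation or disease identification. In contrast to existing membership inference against machine learning classifiers, we focus on the problem that the input and output of multi-modal models are in different modalities, as in image captioning. This work studies the privacy leakage of multi-modal models through the lens of membership inference attacks, the process of determining whether a data record was involved in the model training process. To achieve this, we propose Multi-modal Models Membership Inference (M^4I) with two attack methods for inferring membership status, named metric-based (MB) M^4I and feature-based (FB) M^4I, respectively. More specifically, MB M^4I adopts similarity metrics during the attack to infer the membership of the target data. FB M^4I uses a pre-trained shadow multi-modal feature extractor to carry out the inference attack by comparing the similarity of the extracted input and output features. Extensive experimental results show that both attack methods achieve strong performance: average attack success rates of 72.5% and 94.83%, respectively, are obtained under unrestricted scenarios. Moreover, we evaluate multiple defense mechanisms against our attacks. The source code of the M^4I attacks is publicly available at https://github.com/multimodalmi/multimodal-membership-inference.git.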
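
The metric-based attack can be pictured as a thresholding rule on a similarity score. The sketch below is only a toy version under stated assumptions: it assumes that a caption generated for a training image tends to match the reference caption more closely than one generated for an unseen image, it uses token-level Jaccard overlap as a stand-in similarity metric, and the threshold of 0.5 is arbitrary. The abstract does not specify which metrics or decision rule M^4I uses, and all names here are hypothetical.

```python
# Toy metric-based membership inference; metric and threshold are assumptions.

def jaccard_similarity(text_a, text_b):
    """Token-level Jaccard overlap between two captions, in [0, 1]."""
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def infer_membership(generated_caption, reference_caption, threshold=0.5):
    """Guess 'member' when the model output closely matches the reference."""
    return jaccard_similarity(generated_caption, reference_caption) >= threshold


# Hypothetical model outputs paired with reference captions.
likely_member = infer_membership("a dog runs on the beach",
                                 "a dog runs on the beach")
likely_non_member = infer_membership("a man rides a horse",
                                     "two cats sleep on a sofa")
```

In practice the threshold would be calibrated, for example on data whose membership status is known to the attacker.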

Deep Learning Assisted Optimization for 3D Reconstruction from Single 2D Line Drawings

Zheng Jia, Zhu Yifan, Wang Kehan, Zou Qiang, Zhou Zihan

Category: Computer Vision

2022-09-06

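In this paper, we revisit the long-standing problem of automatically reconstructing 3D objects from single line drawings. Previous optimization-based methods can generate compact and accurate 3D models, but their success rates depend heavily on the ability to (i) identify a set of true geometric constraints and (ii) choose a good initial value for the numerical optimization. In view of these challenges, we propose to train deep neural networks to detect pairwise relationships among geometric entities (i.e., edges) in the 3D object and to predict initial depth values for the vertices. Our experiments on a large dataset of CAD models show that, by leveraging deep learning in a geometric constraint solving pipeline, the success rate of optimization-based 3D reconstruction can be significantly improved.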

What Does the Gradient Tell When Attacking the Graph Structure

Zihan Liu, Ge Wang, Yun Luo, Stan Z. Li

Category: Machine Learning | Artificial Intelligence

2022-08-26

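Recent studies have shown that graph neural networks are vulnerable to adversarial attacks. Attackers can rely solely on the training labels to degrade the performance of an unknown victim model through edge perturbations. Researchers have observed that saliency-based attackers tend to add edges rather than delete them, which has been explained by the fact that adding edges pollutes node features through aggregation, while deleting edges only causes some loss of information. In this paper, we further prove that attackers perturb graphs by adding inter-class edges, which also manifests as a reduction in the homophily of the perturbed graph. From this point of view, saliency-based attackers still have room to improve in capability and imperceptibility. The message passing of the GNN-based surrogate model leads to oversmoothing of nodes connected by inter-class edges, which prevents attackers from capturing the distinctiveness of node features. To solve this issue, we introduce multi-hop aggregated message passing to preserve attribute differences between nodes. In addition, we propose a regularization term that restricts the homophily variance to enhance attack imperceptibility. Experiments verify that our proposed surrogate model improves the attacker's versatility and that the regularization term helps limit the change in homophily of the perturbed graph.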
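
The link between inter-class edges and homophily can be checked on a toy graph. The sketch below assumes the common edge-homophily definition, the fraction of edges whose endpoints share a label; the abstract does not state which homophily measure the paper uses, and the graph and labels are hypothetical.

```python
# Toy check: adding an inter-class edge lowers edge homophily.

def edge_homophily(edges, labels):
    """Fraction of edges that connect two nodes with the same label."""
    if not edges:
        raise ValueError("homophily is undefined for a graph with no edges")
    same_class = sum(1 for u, v in edges if labels[u] == labels[v])
    return same_class / len(edges)


# Hypothetical 4-node graph with two classes.
labels = {0: "A", 1: "A", 2: "B", 3: "B"}
clean_edges = [(0, 1), (2, 3)]              # both edges are intra-class
perturbed_edges = clean_edges + [(1, 2)]    # attacker adds one inter-class edge

clean_homophily = edge_homophily(clean_edges, labels)
perturbed_homophily = edge_homophily(perturbed_edges, labels)
```

Here the clean graph has only intra-class edges, and one added inter-class edge lowers the ratio to two thirds. A regularizer on the change in this quantity would penalize exactly this kind of perturbation.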
